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Abstract 



Influenza virus evolves to escape from immune system antibodies that bind to it. We used 
free energy calculations with Einstein crystals as reference states to calculate the difference 



of antibody binding free energy (AAG) induced by amino acid substitution at each position in 



epitope B of the H3N2 influenza hemagglutinin, the key target for antibody. A substitution 
with positive AAG value decreases the antibody binding constant. On average an uncharged to 
charged amino acid substitution generates the highest AAG values. Also on average, substitu- 
tions between small amino acids generate AAG values near to zero. The 21 sites in epitope B 
have varying expected free energy differences for a random substitution. Historical amino acid 
substitutions in epitope B for the A/Aichi/2/1968 strain of influenza A show that most fixed 
and temporarily circulating substitutions generate positive AAG values. We propose that the 
observed pattern of H3N2 virus evolution is affected by the free energy landscape, the map- 
ping from the free energy landscape to virus fitness landscape, and random genetic drift of the 
virus. Monte Carlo simulations of virus evolution are presented to support this view. 
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1 Introduction 



Influenza A virus causes annual global epidemics resulting in 5-15% of the population being in- 
fected, 3-5 million severe cases, and 250,000-500,000 fatalities. 1 The subtype of influenza A is 
determined by two surface glycoproteins — hemagglutinin (H) and neuraminidase (N). The H3N2 
virus has been one of the dominant circulating subtypes since its emergence in 1968. The antibod- 
ies IgG and IgA are the major components of the immune system that control influenza infection, 
binding to the influenza hemagglutinin. 2 There are five epitopes at the antibody binding sites on 
the top of H3 hemagglutinin, namely epitopes A-E. The epitope bound most prolifically by anti- 
body is defined as the dominant epitope, and it is central to the process of virus neutralization by 
antibody and virus escape substitution. 3 The cellular immune system, on the other hand, plays a 
relatively less recognized role in handling the invasive influenza virus. 2 The cellular system along 
with the innate immune system exerts a somewhat more homogeneous immune reaction against 
genetically distinct influenza strains. 2 ' 

Vaccination is currently the primary method to prevent and control an influenza epidemic in 
the human population. 1 Influenza vaccination raises the level of antibody specific for hemagglu- 
tinin and significantly enhances the binding affinity between antibody and hemagglutinin. Vaccine 
effectiveness depends on the antigenic distance between the hemagglutinin of the administered 
vaccine strain and that of the dominant circulating strain in the same season. 3 ' 5 Memory immune 
response from virus in previous seasons as well as vaccination in the current and previous seasons 
impose selective pressure on the current circulating virus to force it to evolve away from the virus 
strains recognized by memory antibodies that selectively bind to hemagglutinin. 

As a result of the immune pressure and the escape evolution of the influenza virus, which 
is largely substitution in the dominant epitope of hemagglutinin, the influenza vaccine must be 
redesigned and administered each year, and the vaccine effectiveness has been suboptimal in some 
flu seasons. 3 ' 6 The escape evolution in the dominant epitope is at a higher rate than that in the amino 
acid sites outside the dominant epitope. 7 Sites in the dominant epitope also show higher Shannon 
entropy of the 20 amino acids than do those outside the dominant epitope. 8 High substitution 
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rate and Shannon entropy in the dominant epitope of hemagglutinin suggest that the dominant 
epitope is under the strongest positive selection by human antibodies. The immune pressure against 
each genotype of the dominant epitope can be at least partially quantified by the binding constant 
between antibody and hemagglutinin. 

The H3N2 virus and human immune system in this work are simplified to be a system con- 
sisting of the H3 hemagglutinin and the corresponding human antibody. Exposure by infection or 
vaccination produces an affinity-matured antibody with the binding constant to the corresponding 
hemagglutinin equal to 10 6 -10 7 M _1 , while the binding constant of an antibody uncorrelated to 
the hemagglutinin is below 10 2 M -1 . 2 Escape substitutions may decrease the binding constant by 
changing the antibody binding free energy AG. Some substitutions decrease the antibody binding 
constant more than others and have higher probabilities to be fixed, because decrease in the anti- 
body binding constant is favorable to the virus. Here we define the difference of antibody binding 
free energy as AAG = AG42 — AG31 in which AG31 and AG42 are antibody- wildtype hemagglu- 
tinin binding free energy and antibody-evolved hemagglutinin binding free energy, respectively, as 
shown in Figure 1 . The fixation tendency of each substitution is a function of the difference of the 
antibody binding free energy 9 of the escape substitution. 

Epitope A or B of the H3N2 virus was dominant in most influenza seasons. 3 Epitope B of 
the H3N2 virus was the dominant epitope presenting more substitutions than any other epitope in 
the recent years. Epitope B was also dominant in 1968 when H3N2 virus emerged. Thus during 
these periods of time, the substitutions in epitope B directly affect the antibody binding constant 
and reflect the direction of the virus escape substitution. To attain a global view of the effects 
of substitutions in epitope B, it is necessary to compute a matrix containing the differences of 
antibody binding free energy caused by each possible single substitution in epitope B. There are 
21 amino acid sites in epitope B, and each residue in the wild type strain may substitute to any of 
the 19 different types to amino acid residues, hence we need to calculate a 19 x 21 matrix with 399 
elements. Such a matrix is a free energy landscape quantifying the immune selection over each 
evolved influenza strain. In this free energy landscape, the virus tends to evolve to a position with 
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low binding affinity of antibody to evade antibodies and reduce the immune pressure. Calculation 
of this landscape will enable us to study the mechanism of immune escape from a quantitative 
viewpoint, providing a criterion to describe and foresee the evolution of influenza virus. 

This paper is organized as follows: In Materials and Methods section, we describe the protocol 
for the free energy calculation and the system of hemagglutinin and antibody. In Results section, 
we present and analyze the calculated free energy landscape. The substitutions observed in history 
are also compared with the results of the calculation. In the Discussion section, a general pic- 
ture of H3N2 virus evolution under the selection pressure of the immune system is discussed and 
simulation results are discussed. Finally, our work is summarized in the Conclusion section. 

2 Materials and Methods 

2.1 Scheme of the Free Energy Calculation 

The expression of the binding constant K depends on the antibody binding free energy AG, K = 
exp (— AG/RT). The Boltzmann constants = 1.987 x 10~ 3 kcal/mol/K. The temperature is fixed 
to the normal human body temperature T = 310 K. Shown in Figure 1, one substitution in hemag- 
glutinin changes the antibody binding free energy from AG31 to AG42. The first and second sub- 
scripts define the end state and the starting state of the binding process, respectively. The ratio of 
the antibody binding constant after and before substitution is written as 

^=exp(-AAG/7?r) (1) 

where K\ and Kq are the antibody binding constant to substituted and wildtype hemagglutinin, 
respectively. 

The difference of the antibody binding free energy AAG = AG42 — AG31 = AG43 — AG21 is 
calculated by applying the Hess' Law to the thermodynamic cycle defined by State 1-4 in Figure 1. 
The processes corresponding to AG43 and AG21 are unphysical but more convenient to simulate. 
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We calculated AG21 and AG43 f° r eacn amino acid substitution in the unbound hemagglutinin and 
hemagglutinin bound by antibody, respectively. On the surface of the virus particle, hemagglutinin 
exists in the form of a trimer in which three monomers are encoded by the same virus gene. Thus 
we simultaneously substituted the amino acids in three hemagglutinin monomers in the trimer. 
The antibody has a Y-shaped structure with two heavy chains and two light chains. In the resolved 
structure (PDB code: 1KEN), the hemagglutinin trimer is bound by two Fab fragments. Thus, we 
incorporated the Fab dimer into the system for MD simulation. 

Using the software CHARMM, 10 we calculated each of AG21 and AG43 using thermodynamic 
integration. 11 We used molecular dynamics (MD) simulation to obtain the ensemble averages of 
the integrand from which each of AG21 and AG43 is calculated. The potential energy for the MD 
algorithm to sample the conformation space of the system is 

U (r, A) = (1 - A) t/ rea c (r) + At/ prod (r) (2) 

in which r is the coordinates of all the atoms, A is the variable of integration, U reac is the potential 
energy of the system corresponding to wildtype hemagglutinin, and £/ pro d is the potential energy 
of the system corresponding to substituted hemagglutinin. The value of AG21 or AG43 is 




The integrand (t/ pro d (r) — U ieac ( r ))x is the ensemble average with fixed A of potential energy dif- 
ference between the system after and before substitution. The interval of integration A G (0, 1) was 
equally divided into four subintervals in each of which a 16-point Gauss-Legendre quadrature was 
applied to numerically integrate the ensemble averages. The ensemble averages with 64 distinct 
A G (0, 1) were calculated by MD simulation with the potential energy defined in Eq. (2). 
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Figure 1: The scheme of the free energy calculation. The free energy difference of one substitution 
is calculated by AAG = AG43 — AG21. State n, n = 1-4, is the real system. State na has the 
same configuration of atoms as state n except that all the hydrogen atoms have mass 16.000 amu. 
Compared to state na, state nb contains one additional Einstein crystal of product atoms (n = 1,3) 
or reactant atoms (n = 2,4). The mass of hydrogen atoms in state nb is also 16.000 amu. Free 
energy AG2ib and AG43b are obtained by thermodynamic integration. 

2.2 Einstein Crystal 

We introduce the Einstein crystals to calculate the free energy of the reference state in the dual 
topology at both endpoints of the thermodynamic integration. To illustrate the function of the 
Einstein crystals, we analyze the free energy of the dual topology without Einstein crystals when 
A = as an example. We denoted by n\, n%, and no numbers of the reactant atoms, product 
atoms, and all the remaining atoms in the system, respectively. We denoted by r, r pro( iuct> and x the 
coordinates of the reactant atoms, product atoms, and all the remaining atoms in the system. The 
momenta of reactant atoms, product atoms, and the remaining atoms are denoted by p t j, p P)i -, and 
p X) i. The masses are similarly denoted by m r ,„ ra P;( -, and m x , ; . The Hamiltonian of the system with 
X =0is 

"0 n 2 . "In 2 . "2 n 2 . 

H = L + L T^- + £ + U »o W + (1 - *) Uno+m (x, r) + W nQ+ll2 (x, r product ) . (4) 

■ =1 Zffl x ,i i—i ^ m r,i i = i ^ m p,i 
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The partition function is 



e=n -sir n w rr 



y"(kdr exp[-j8C/ no (x) - j8C/ Wo+ni (x,r)] x ^ dr product 1 

=Greal X J] 



27Tm D A 3//2 



"2 

( =1 l"^ 

= 2real x 2product- (5) 

When A = 0, this partition function is the product of Q ma \, the partition function of the real system 
without product atoms, and <2 pr oduct> the partition function of the product atoms when X = 0. 

The free energy is given by — 1/j8 times the logarithm of the above partition function. The free 
energy is 

1 ^ 3 / 2nm vi \ 1 
C=G te , 1 - ? E-log(^j- r2 lo g V. (6) 

As shown in the above equation, the effect on the translational entropy from the product atoms is 
proportional to the logarithm of system size V. It diverges in the thermodynamic limit. This diver- 
gence exists, no matter what X scaling is performed. Note that we do not use the Einstein crystals 
to handle the translational entropy a ligand loses or gains when binding a flexible biomolecular 
receptor, which is taken into account by the thermodynamic cycle in Figure 1 . The translational 
entropy, proportional to log V in Eq. (6), is that of the dummy product atoms, not that of the bound 
or unbound complex. 

The value of G depends on the identity of the product atoms. Thus, the contribution to the ther- 
modynamic integration is different at the two endpoints, i.e. — kT log <2 re actant 7^ — ^log<2 product , 
in which <2 re actant is the partition function of the reactant atoms when X = 1 . Note also that the 
expression of the partition function contains the factor <2 pro duct for the product atoms. Relating the 
conventional expression for thermodynamic integration, Eq. (3), to AAG of Eq. (1) requires one to 
account for this term. This term arises from the use of a dual topology in CHARMM, and this term 
is typically ignored. While the contribution from the decoupled atoms is not constant, it can be 



exactly calculated if the restricted partition function over the decoupled atoms can be calculated. 
This calculation is what the Einstein crystal performs, using an Einstein crystal for the reference 
state rather the ideal gas in Eq. (4). 

In four 16-window thermodynamic integrations, the smallest variable of integration is X = 
1.32 x 10~ 3 . Since X is close to zero, product atoms in the system have potential energy near zero 
and behave as ideal gas atoms, with translational entropy proportional to the logarithm of system 
size, see Eq. (6). Exact calculation of the translational entropy terms of product atoms at X = 
by explicit dynamics seems difficult, because the translational entropy of the product atoms grows 
as the logarithm of the system size. These relatively free product atoms destabilize the system. 
This entropy divergence is a fundamental feature of the statistical mechanics, not a numerical arti- 
fact. Unrestrained product atoms induce large fluctuation of the Hamiltonian in the MD algorithm. 
These fluctuations increase the standard error of the quantity £/ pro d (r) — U KSLC (r), which is defined 
in Eq. (3) and is computed from the trajectory of the MD simulation. These fluctuations often 
cause the numerical integration algorithm in the MD simulation to be unstable. 12 In this case, the 
energy of the simulated system increases rapidly. This phenomenon causes CHARMM to termi- 
nate abnormally. The translational entropy introduced by the free atoms at X = and 1 affects the 
result. Reactant atoms cause the same problem near X = 1. 

We noticed that the non-linear scaling, i.e. using a high power of X such as the fourth power 
of X, in Eq. (2) 13 ' 14 did not work. The high power of the smallest X is extremely close to zero 
and the product atoms are almost free, which cause the MD simulation to terminate abnormally 
at several windows with small X. Additionally, the issue of translational entropy of reactant and 
product atoms needs to be addressed. Even when the MD algorithm with the non-linear scaling 
of A 13 ' 14 terminates and appears to have generated a converged simulation trajectory, this does 
not necessarily imply that the translational entropy of reactant or product atoms has been properly 
controlled. In fact, the X scaling approach may hide the entropy divergence at X = or X = 1 
by letting the algorithm terminate due to numerical roundoff error, rather than building statistical 
mechanical reference states for each of X = and X = 1 to account for or control the effect of 
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translational entropy. 

An alternative to X scaling introduces the soft-core potential as a way to turn off the poten- 
tial. 15 ' 16 The soft -core approach, like the lambda- scaling approach, does not address the translation 
entropy of the atoms at X = or X = 1 . Previous studies with non-constrained atoms at both end- 
points have been performed. 17-23 Besides the classical molecular dynamics with a non-ideal-gas 
reference state introduced into the dual topology, quantum molecular dynamics via metadynam- 
ics has been used to analyze a deamidation process. 24 Other applications of quantum molecular 
dynamics based free energy calculation include chorismate conversion to prephenate, 25 isomer- 
ization of glycine, 26 and histone lysine methylation. 27 As illustrated in Eq. (6), the translational 
entropy of the uncoupled atoms causes error in the final free energy results if it is not accounted 
for. 

One way to calculate the free energy change exactly is to use a non-ideal-gas reference state. 
This is quite natural, since the protein is not composed of ideal gas atoms. Deng and Roux in- 
troduced restraint potentials to confine the translational and rotational motion of a bound ligand 
to accelerate convergence of the simulation. 28 We use this idea to exactly include the contribution 
from the restrained states and built two Einstein crystals as the reference states for reactant and 
product atoms, respectively. Our calculation allows a theoretically exact determination of the free 
energy due to amino acid substitution. 

To handle these two difficulties at both endpoints of the integration in a theoretically exact way, 
we use two Einstein crystals as the reference states for reactant and product atoms, respectively. 
The Einstein crystal has been used as a reference state for free energy calculations. Frenkel and 
Ladd computed free energy of solids by building a path connecting the real solid and the reference 
Einstein crystal. 29 Noya et al. showed that a restrained Einstein crystal is a suitable reference in the 
free energy calculation of biomolecules. 30 The Einstein crystal, a solid state model, is consistent 
with the nature of antibody binding process in liquid phase. First, although the importance of 
biomolecular flexibility in protein-protein binding process is well-accepted, and is fully and exactly 
included in our calculation, we simply need to localize the product atoms when X = and the 
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reactant atoms when A = 1 . Moreover, we need to calculate the contribution to the free energy of 
these localized atoms. 

The choice of Einstein crystals as the reference states removes the singularity in thermody- 
namic integration in Eq. (3). As an example, an Einstein crystal was used as the reference state 
for the free energy calculation of hard-sphere fluid in order to remove the singularity in Eq. (3) at 
the end point X = 0. 31 In this example, the reference Einstein crystal was achieved by harmon- 
ically coupling the particles to their equilibrium positions and removing all interactions between 
particles. 32 

We here use Einstein crystals as the reference states to calculate the binding free energy change 
due to amino acid substitution. The Einstein crystal is a model for localized atoms. The free energy 
of the Einstein crystal can be exactly calculated. One Einstein crystal contains distinguishable and 
non-interacting atoms under harmonic constraints around reference positions fixed in space. In the 
Einstein crystal, the atom i with coordinates r, has potential energy 



in which r,- and r ; o are the actual and reference position of the atom, respectively, and Ki is the 
force constant of the harmonic constraint. We denote by m, the mass of atom /. The canonical 
partition function of an Einstein crystal is 



Ui(n) 



2 



(7) 



n - r i0 




(8) 



The spatial fluctuation of atom i in the Einstein crystal is 




(9) 
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In our system, we let the potential energy for MD simulation defined by Eq. (2) become 



U (r, X) = (1 - X) C/ re ac (r) + At/ prod (r) + A£/ e in,reac (r) + (1 - A) t/ e in,prod (r) . 



(10) 



Therefore reactant and product atoms are localized at both X = and A = 1 . The reference po- 
sitions of atoms in Einstein crystals are the equilibrium positions of corresponding reactant and 
product atoms. To minimize the numerical error during the thermodynamic integration calculation, 



(£4in,reac (r) - t/ r eac (r) ) x + (U pmd (r) - £/ e in, P rod (r) ) A ■ Minimization of the terms on the right hand 
size is approximately achieved by letting the average spatial fluctuation of each atom in Einstein 
crystals equal to that of the corresponding reactant or product atom, i.e. 



For each atom in the Einstein crystal, the force constant of harmonic constraint, K- eac or Kf , 
was calculated from the monitored fluctuations of the corresponding reactant or product atom with 
Eq. (11) or Eq. (12). In the scheme in Figure 1, the states with Einstein crystals are states lb, 2b, 
3b, and 4b. 

2.3 Modified Hydrogen Atoms 

The frequency of atom vibration depends on its mass. Hydrogen atoms generally have the highest 
vibration frequencies in the system. Such high frequencies require short time step in MD simula- 
tion and increase computational load. To limit vibration frequencies and allow a longer time step, 
one can apply the SHAKE algorithm to fix the length of any bond involving hydrogen atoms. 33 
The SHAKE algorithm decreases the degrees of freedom in the system by introducing additional 
constraints between atoms. Instead, we artificially changed the mass of hydrogen atoms from 1 .008 



we minimized the fluctuation of the integrand of thermodynamic integration (dU (r,A) /dX) x 




(ID 




(12) 
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to 16.000 amu in order to preserve degree of freedom in the system following the suggestion by 
Bennett. 34 A larger mass of hydrogen atoms allows a longer time step in the MD algorithm. Pomes 
and McCammon showed that changing the hydrogen mass to 10 amu allow using a 0.01 ps time 
step to simulate a system which consists of 215 TIP3P water molecules, smaller than our system. 35 
Feenstra et al. change the mass of hydrogen atoms to 4 amu to increase the simulation stability of 
a system which contains protein and water molecules and resembles our system. 36 We set the time 
step as 0.001 ps, a value widely used in simulations with physical masses for all atoms, to gain 
higher stability in the simulation of our large system with a hemagglutinin trimer, a Fab dimer, and 
water molecules. As with the Einstein crystals, we exactly calculated and subtracted off the con- 
tribution of the change to the hydrogen mass to AAG. Note that the modification of hydrogen mass 
is independent to the reference states in the simulation, which is selected to be Einstein crystals 
in this project. In fact, most of the hydrogen atoms in the system are neither reactant nor product 
atoms. In Figure 1, the states with Einstein crystals and modified hydrogen atoms are states la, 2a, 
3a, 4a, lb, 2b, 3b, and 4b. 

2.4 Expressions of Free Energies 

Introducing two Einstein crystals and heavier hydrogen atoms changes the potential energy in the 
system, as well as the canonical partition functions. After modification of hydrogen atoms, the 
mass of atoms changed from to m[ ■, from m P]i to m' p ■, or from m X]i to m' x ■. Canonical partition 
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functions of the states in Figure 1 are: 

xZ 3 (n + n h V,T) (13) 
1 «o /2W A 3/2 m /2W,\ 3/2 

^<* + »'-™=^n(-jn II ( ) 

Z 3 (n + m,V,r) (14) 



1 jo / 27Tml , \ 3/2 Ji / 2^m; , \ 3/2 

&(„„+„, + „ 2 ,v,r ) =^ J n^j 

/2tt\ 3w2 " 2 / m' ■ \ 3/2 

xz 3 («o+ni,v,r)^j d5) 

1 pf ( 2nm x i \ 3//2 J 2 / 2nm v ,• \ 3//2 

xZ 4 («o + «2,V,r) (16) 
1 Jo /2W A 3/2 "2/2 W A 3/2 

xZ 4 («o + «2,V,r) (17) 
1 jo /2^m' A 3/2 ^2 /2^:m' A 3/2 

f2Tt\ 3ni 111 ( m' \ 3 ^ 2 
xZ 4 (n + n 2 ,V,T)^-j ft ( J^E ) d«) 

in which the states are denoted by the subscripts. Contribution of the potential energy part of the 
Hamiltonian to the partition function is 

Z 3 (no + m,V,T) = |exp(- J 8t/„ 0+ „ 1 (r))dr (19) 
Z 4 (n + n 2 ,V,T) = f exp(-pU n()+n2 (r))dr (20) 
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AG 4 3b = -4 In 



(25) 



From the partition functions, free energies defined in Figure 1 are calculated: 

3 £° ( m > A 3 m ( m > A 

AG4> = 3 £ J<A 3 £ ]n KA 
2/3 £ ^m X;I y / 2/3 ; tt ymp. y 

AG 3b = — ^ In -^-£ ln — ^ (23) 

P WJ 2/3 £ \K? md J 

/ \ 3/2 

n£i(<-/^r c ) z 4 (n +n 2 ,y,r) 

r [iEi (m;,/<° d ) 3/2z3 ^ + " 1 ' y ' r ) 
The free energy between state 3 and 4 is 

i (2^/^) 3 -L- 1 (m p , ( ./^r d ) 3/2 i e E2 (n 2 ,y,r) 

AG43 = AG 43b - „ In ^ ^tj- = AG 43b - ^ In ^ 2 ' ' (26) 

in which <2ei and <2e2 are the partition functions of the Einstein crystals for product atoms and 
reactant atoms, respectively. The free energy AG 4 3b was calculated by thermodynamic integration 
while AG 4 3 was used to calculate the free energy difference of one substitution. Note that the cor- 
rection term between AG 4 3b and AG 4 3 is independent of the masses of atoms. Canonical partition 
functions as well as free energies of the state 1,1a, lb, 2, 2a, and 2b are calculated in a similar 
way. 

2.5 Implementation of Free Energy Calculation Algorithm 

The above discussion is the theoretical basis for the implementation of our free energy calculation 
algorithm. The free energy calculation protocol consists of four steps. First, we built the dual 
topology with reactant and product atoms in the amino acid substitution site in separated antibody 
and hemagglutinin or antibody-hemagglutinin complex. We then solvated the protein system and 
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modified the mass of hydrogen atoms. Second, two Einstein crystals were introduced as the ref- 
erence states for the reactant and product atoms, respectively. Third, the MD simulation was run 
at 64 windows. The thermodynamic integration algorithm obtained the free energy values AG21 
for separated antibody and hemagglutinin or AG43 for antibody-hemagglutinin complex, as in Fig- 
ure 1. This step gave the AAG value. Fourth, we calculated the error bar of the AAG value obtained 
in the last step. The technical details of these four steps are illustrated in the text below. Also 
described are the verification of the free energy calculation protocol, the software and hardware 
information, and the CPU hours consumed by the protocol. 

The hemagglutinin trimer of H3N2 virus strain A/Aichi/2/1968 with bound dimer antibody 
HC63 (PDB code: 1KEN) was used in our calculation. For each amino acid substitution, we 
built the dual topology with side chains of both amino acids prior to the simulation. Reactant 
and product atoms were defined as the side chains in the original and substituting amino acid, 
respectively. All the covalent and non-bonded interactions between reactant and product atoms 
were removed. The protein was in an explicit water box with periodic boundary condition. The 
mass of hydrogen atoms was changed from 1.008 to 16.000 amu. 

All the simulations were performed by CHARMM c33b2 with CHARMM22 force field. 10 
We first fixed the positions of hemagglutinin trimer except reactant atoms and minimized the sys- 
tem with 200 steps of steepest descent (SD) algorithm and 5000 steps of adopted basis Newton- 
Raphson (ABNR) algorithm. We ran a 5 ps MD simulation of the system, the trajectory of which 
gave the spatial fluctuation ^(<5r ; ) 2 ^ of each reactant atom. Then we fixed reactant atoms, released 
product atoms, and ran a 5 ps MD simulation to obtain the spatial fluctuation of each product atom. 
Final positions of both reactant and product were adopted as the reference positions of the corre- 
sponding Einstein crystal. The force constant K{ of each atom in Einstein crystals was obtained 
from ^(<5r ; ) 2 ^ by Eq. (11) and Eq. (12). With modified hydrogen atoms and two Einstein crystals 
as the reference states of reactant and product atoms, state lb, 2b, 3b, and 4b in Figure 1 were 
generated for thermodynamic integration. 

In thermodynamic integration, MD simulations were run at 64 windows with distinct A . In each 



15 



window, pressure of the system was first calibrated with a 10 ps MD simulation in an isothermal- 
isobaric (NPT) ensemble. The duration of 10 ps is appropriate because it is long enough to equi- 
librate the pressure and short enough to prevent the protein from drifting away from the original 
location. We fixed coordinates of the residues and water molecules except for those within 15 A 
from the three alpha carbons. Then we removed amino acid residues and water molecules other 
than those within 27.5 A from the three alpha carbons of substituted residues in the hemagglutinin 
trimer to reduce the system size, because the fixed atoms are not included in the topology of mov- 
able atoms and the cutoff of the non-bonded forces is 12 A. The Ewald sum was used to calculate 
charge interactions. Note that this substantial reduction of the system relies on the assumption 
that the free energy change due to the amino acid substitution is mostly affected by atoms near the 
binding site after the system reaches equilibrium. This assumption is based on two facts: the con- 
formations of hemagglutinin and antibody are stable once the system reaches equilibrium, and all 
the removed or fixed atoms have invariant interactions with the substituting amino acid residues. 
The stable protein conformation means amino acid residues far away from the substituting residue 
do not move during the amino acid substitution process. In the CHARMM22 force field used in 
this project, the cutoff of non-bonded force is 12 A and less than the 15 A threshold for system 
reduction. The system reduction does not directly affect the force on the substituted residue be- 
cause of absence of the long-range non-bonded force between the substituted residue and atoms 
removed from the system. This system reduction method was also applied to compute binding free 
energy of subtilisin, 37 of tripsin, 21 and of Src SH2 domain. 22 Robust results were obtained in all 
of these applications. Generally, this system reduction strategy can produce reliable result if the 
reduced system contains the residues and molecules critical to the binding process. 21 We note that 
the system reduction method could be a limitation of the free energy calculation model. The fixing 
of amino acid residues and water molecules described in section 2.5 substantially reduced the CPU 
time needed, but is an approximation to the real system containing the whole proteins. This lim- 
itation reflects the tradeoff between model accuracy and required computational resource. In the 
canonical ensemble, the new system was equilibrated for 200 ps and simulated for another 900 ps 
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as the data production phase. The integrand of thermodynamic integration is the ensemble average 
of the sampled trajectory (dU (r, X)/dX) x = <t/ e in,reac (r) - U ieac (r) - £/ e in,prod (r) + U pmd (r)) A . 
The free energy AG21 and AG43 between the real states was calculated by adding a correction 
term of the Einstein crystals in Eq. (26). Finally, the difference of antibody binding free energy is 
AAG = AG43-AG21. 

Error bars of AAG are also given. The convergence behavior of the simulation was analyzed 
using the block average method developed by Fly vbjerg and Petersen. 38 As mentioned above, the 
MD simulation for either the unbound hemagglutinin or the hemagglutinin- antibody complex con- 
tains 64 windows with distinct A. The 900 ps data production phase contains 9 x 10 5 simulation 
steps. The values A = U pro d (r) — t/ rea c (r), as in equation Eq. (3), computed in consecutive simula- 
tion steps were grouped into bins, and consecutive bins were merged progressively. The quantity 
<7 2 (A) I (n — 1), in which a 2 (A) is the variance of the average of each bin A\,Ai, ■ ■ . ,A„ and n is 
the number of bins, increases with the bin size and reaches a plateau when the bin size is 1 x 10 4 
steps. We fixed the bin size to 1 x 10 4 steps and estimate the variance of ensemble average (A) as 
a 2 (A) / (n — 1), following Fly vbjerg and Petersen's method. 38 

This protocol, without the Einstein crystal contribution, was verified by recalculating published 
free energy differences of amino acid substitution T131I. 9 Without the Einstein crystal contribu- 
tion, our protocol gave the AAG = 5.69 ±0.07 kcal/mol, compared to the AAG = 5.20 ±0.94 
kcal/mol in the published work. 9 Theoretically exact results presented here include the Einstein 
crystal contribution. We note that the theoretically exact AAG for T131I, including the Einstein 
crystal contribution, is 3.71 ±0.07 kcal/mol. 

The simulation was performed using CHARMM22 force field at three clusters: tg-steele.purdue.teragrid.org 
(Intel Xeon E5410, 2.33 GHz), sugar.rice.edu (Intel Xeon E5440, 2.83 GHz), and biou.rice.edu 
(IBM POWER7, 3.55 GHz), as well as at the condor pool tg-condor.rcac.purdue.edu at Purdue 
University. Simulation of each substitution took approximately 7.5 thousand CPU hours on aver- 
age, and so this work consumed about three million CPU hours. 
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3 Results 



3.1 Free Energy Landscape 

For each of the 21 amino acid sites in epitope B, we substituted from alanine to each one of the 
19 other amino acids, in which we used the neutral histidine (CHARMM code: Hse) as the model 
of histidine. The free energy difference and standard error of each substitution were calculated by 
the MD simulation (see Materials and Methods). The wildtype amino acid in each site of epitope 
B was extracted from the hemagglutinin sequence of the H3N2 strain A/Aichi/2/1968. The free 
energy difference and standard error of the substitution from the wildtype amino acid in each site 
were then calculated from the values for the change from the wildtype amino acid to alanine and 
from alanine to the new amino acid. The values are listed in Table 1. 

As described in Eq. (26), each AAG value listed in Table 1 contains the contribution of two 
Einstein crystals. The contribution of Einstein crystals to the final AAG values was calculated for 
each of the 399 amino acid substitutions in epitope B. The average fraction of the contribution of 
Einstein crystals in the calculated AAG values is 44%. The contribution of Einstein crystals is far 
greater than that of the statistical error of our free energy calculation in Table 1, which is 4.5% on 
average. Thus, the Einstein crystal contribution is both theoretically exact and practically impor- 
tant. In 371 of the 399 substitutions, the absolute values of the contribution of Einstein crystals 
is greater than 1.96 standard errors of the final AAG values. That is, the contribution of Einstein 
crystals is significant with p < 0.05 in 93.0% of all the amino acid substitutions. Consequently, 
it is essential to incorporate Einstein crystals in the free energy calculation to eliminate the error 
caused by the methods that neglect the unknown effect of the translational entropy of the free 
atoms in thermodynamic integration. The contribution of the translational entropy of ideal gas-like 
atoms (A = or X = 1) needs to be either calculated or removed by a theoretically exact method 
to perform an exact free energy calculation. 

The obtained AAG values allow us to analyze the character of each of the 20 amino acids. 
We first averaged over all the 21 amino acid sites in epitope B the AAG value caused by the 
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Table 1 : Summary of the calculated free energy differences AAG in each amino acid site in epitope 
B from the wildtype amino acid to all 20 amino acids. The standard errors are also listed. The free 
energy difference and its standard error of the substitution from the wildtype amino acid to itself 
are both zero. The units of free energy differences and their standard errors are kcal/mol. 



Ala -13.12±0,27 3.33±0.29 2.78±0.20 1,19±0.33 2.48±0.21 4.27±0.31 5.18±0,21 

Arg 22.57±0.46 2.31 ±0.45 16.98±0.37 0.08±0.50 -4. 19 ±0.44 -1.61 ±0.48 7.07±0.42 

Asp -4.80±0.36 5.83±0.42 -7.83±0.30 10.72±0.40 5.64±0.34 3.41±0.42 10.97±0.35 

Asp 4.52±0.38 19,12±0,42 16.28±0.32 11.06 ±0.42 9.95±0.37 18.37±0.40 15.34±0.36 



Cys -11.83 = 0.34 12.64 = 0.37 -2.37±0.30 5.32±0.38 -2.72±0.29 -7.88±0.40 7.92±0.32 



Gin 


-12.37 ±0.40 


7. 34 ±0.42 


-4.29±0.36 


13.14±0.41 


-0.45 ±0.36 


11.47 ±0.43 


6.54 ±0.40 


Glu 


11.15 ±0.38 


10.50 ±0.42 


17.77±0.34 


26.54 ±0.43 


4.68±0.36 


8.58±0.48 


5.19±0.39 


Gly 


-9.93 ±0.39 


0.00 ±0.00 


17.00±0.34 


0.11 ±0.44 


0.21 ±0.36 


0.00 ±0.00 


-4.19±0.41 


Hsc 


4.43 ±0.42 


0.15 ±0.43 


2.47 ±0.36 


-6.89 ±0.43 


12.18±0.38 


5.54±0.46 


1.06±0.39 


lie 


-16.03 ±0.41 


0.54 ±0.40 


1.55±0.33 


8.33±0.42 


11.22±0.37 


8.09±0.43 


18.96±0.39 


Leu 


-23.58±0.41 


-4.27 ±0.43 


-8.92±0.33 


2.64 ±0.45 


-6.26±0.39 


1.61 ±0.45 


4.08±0.38 


Lys 


3.57±0.45 


1 1. 18 ± 0.46 


14.58±0.37 


0.00 ±0.00 


6.24 ±0.40 


-1.60 ±0.48 


5.39±0.46 


Met 


-13.38±0.39 


-2.59 ±0.39 


1.23±0.35 


10.11 ±0.43 


16. 15 ±0.36 


14.49 ±0.44 


-6.38±0.37 


Phe 


-10.21 ±0.45 


6.12±0.43 


9.39 ±0.35 


0.30 ±0.45 


10.28±0.40 


5.17±0.48 


12.33 ±0.42 


Pro 


-9.36±0.36 


-2.43 ±0.42 


-1.86±0.31 


2.32±0.43 


5.69 ±0.30 


17.09 ±0.40 


6.08±0.36 


Ser 


-14.55 ±0.34 


3.36±0.37 


-1.09 ±0.29 


-1.45 ±0.38 


0.00 ±0.00 


2.76±0.39 


0.00 ±0.00 


Thr 


0.00 ±0.00 


7.35 ±0.36 


0.00 ±0.00 


-1.08±0.41 


6.34 ±0.32 


8.36±0.41 


15.32±0.32 


Trp 


9.82 ±0.47 


4.81 ±0.47 


19.84±0.43 


23.26 ±0.48 


16.14±0.45 


3.52±0.62 


-1.35±0.45 


Tyr 


-14.83 ±0.43 


2.72±0.42 


7.25±0.36 


-2.18±0.46 


-8.37±0.44 


18.42±0.51 


5.95 ±0.43 


Val 


-19.13±0.37 


3.56±0.38 


8.57±0.31 


-3.01 ±0.39 


7.63±0.32 


3.77±0.42 


6.45±0.32 



Positions 


160 


163 


165 


186 


187 


188 


189 


Ala 


4.16±0.22 


-0.24 ±0.22 


4.15±0.24 


-3.19±0.19 


-4.03 ±0.23 


3.45 ±0.25 


-9.01 ±0.28 


Arg 


9.70 ±0.44 


5.97±0.39 


14.58±0.41 


21.01 ±0.38 


8.12±0.42 


-0.06 ±0.45 


-0.39 ±0.48 


Asn 


2.07 ±0.34 


-2.32 ±0.32 


0.00 ±0.00 


4.67±0.30 


-10.07 ±0.34 


0.00 ±0.00 


-3.18 ±0.37 


Asp 


13.50±0.32 


12.64 ±0.32 


25.01 ±0.31 


24.54 ±0.28 


7.78±0.35 


19.77±0.37 


6.77±0.35 


Cys 


15.82±0.31 


1.84±0.30 


1.93 ±0.29 


-2.30±0.25 


-11. 09 ±0.32 


4.07±0.34 


6.23±0.33 


Gin 


3.04±0.39 


-8.29±0.35 


4.27 ±0.36 


5.16±0.33 


-2.87±0.37 


12.36±0.39 


0.00 ±0.00 


Glu 


15.48±0.36 


2.17±0.35 


15.74±0.34 


33.29±0.31 


14.41 ±0.35 


10.10±0.37 


12.16±0.39 


Gly 


1.22±0.39 


-5.83±0.38 


9.1 1 ±0.37 


0.13 ±0.27 


-0.60±0.30 


-5.06 ±0.32 


-5.69 ±0.32 


Hse 


0.52±0.38 


6.31 ±0.33 


7.44±0.33 


18. 15 ±0.30 


3.69±0.39 


-1.95 ±0.40 


-8.53±0.40 


He 


1.51 ±0.34 


10.62 ±0.36 


3.85±0.33 


-1.85±0.30 


-2.51 ±0.33 


-4.77 ±0.37 


3.65 ±0.37 


Leu 


-1.39±0.40 


3.85±0.35 


-9.20±0.37 


1.07±0.30 


-0.40±0.38 


-1.30 ±0.37 


-6.91 ±0.39 


Lys 


5.91 ±0.44 


10.37±0.38 


1.93±0.41 


-1.15±0.39 


24.91 ±0.41 


8.42 ±0.44 


9.48±0.64 


Met 


10.78±0.38 


7.22±0.35 


1.63±0.36 


13.06±0.33 


-5.11 ±0.36 


6.97±0.38 


6.86±0.40 


Phe 


7.90±0.41 


-0.86 ±0.36 


13.87±0.38 


6.94±0.33 


-7.23 ±0.39 


2.05 ±0.39 


4.37±0.43 


Pro 


4.51 ±0.32 


12.50 ±0.34 


18.96±0.33 


11. 82 ±0.29 


10.69±0.32 


-10.24±0.35 


-8.98±0.36 


Ser 


7.13±0.29 


9.07±0.30 


-0.92 ±0.28 


0.00 ±0.00 


-4.88±0.31 


8.09±0.33 


-5.09 ±0.34 


Thr 


0.00 ±0.00 


9.18±0.30 


10.35±0.31 


-14.79±0.27 


0.00 ±0.00 


3.53 ±0.38 


9.30±0.35 


Tip 


0.86 ±0.44 


12.34 ±0.35 


19.02±0.43 


-7.69±0.38 


-11. 04 ±0.48 


7.20 ±0.40 


-9. 19 ±0.45 


Tyr 


-5.43 ±0.39 


1.06±0.34 


14.76±0.37 


11. 90 ±0.33 


5.29 ±0.42 


1.57 ±0.40 


4.81 ±0.41 



Val 7.99±0.34 0.00±0.00 9.79±0.32 2.97±0.29 3.08±0.33 3.73±0.34 -7.89±0.36 



Positions 


190 


192 


193 


194 


196 


197 


198 


Ala 


-18.12±0.24 


-0.86 ±0.23 


-5.20±0.20 


2.37 ±0.23 


5.95±0.23 


-2.40 ±0.29 


0.00 ±0.00 


Arg 


4.97±0.41 


23.07 ±0.44 


32.33±0.41 


-13.66±0.37 


-25.38±0.44 


-17.94±0.47 


3.99±0.37 


Asn 


-16.44 ±0.30 


-2.56 ±0.32 


8.24 ±0.30 


—3.81 ±0.31 


13.27±0.36 


-6.58±0.38 


0.05 ±0.28 


Asp 


18.75 ±0.32 


2.92±0.32 


15. 29 ±0.29 


26.72 ±0.35 


9.25 ±0.34 


5.58±0.39 


5.17±0.24 


Cys 


-20.36 ±0.32 


-1.45±0.32 


-9.79±0.26 


1.91 ±0.30 


1.30±0.31 


6.70±0.36 


5.91 ±0.22 


Gin 


-17.37±0.37 


-6.00 ±0.37 


4.87 ±0.34 


-0.83 ±0.32 


7.68 ±0.36 


0.00 ±0.00 


1.41 ±0.31 


Glu 


0.00 ±0.00 


1.18±0.35 


45.40±0.34 


38.35±0.33 


3.60±0.36 


1 1 .34 ± 0.41 


2.37±0.30 


Gly 


-17.09 ±0.29 


-13.46±0.30 


-13.89±0.27 


-18.59±0.30 


8.08±0.31 


4.11 ±0.36 


3.65±0.28 


Hse 


-26.26 ±0.35 


-0.96 ±0.38 


-0.96±0.35 


9.95 ±0.34 


18.42 ±0.37 


-2.62 ±0.40 


-3.27±0.35 


He 


-16.45 ±0.37 


-5.57 ±0.37 


-3.80±0.31 


-6.91 ±0.32 


0.77 ±0.34 


1.23 ±0.41 


0.01 ±0.32 


Leu 


-17.27±0.36 


-7.97 ±0.37 


10.76±0.34 


0.00 ±0.00 


10.07±0.39 


-0.03 ±0.40 


-11.18±0.29 


Lys 


-9.33±0.38 


5. 67 ±0.42 


39.36±0.39 


-16.67±0.38 


0.49 ±0.40 


-16.50±0.47 


1.98±0.37 


Met 


-26.63 ±0.34 


6.82±0.36 


-2.91 ±0.32 


7.75 ±0.35 


4.08 ±0.37 


-7.79 ±0.40 


15.57±0.32 


Phe 


-31.89±0.39 


1 .56 ±0.40 


16.46±0.59 


2.78±0.34 


-1.99±0.37 


1.05 ±0.44 


8.73 ±0.34 


Pro 


-17.85±0.33 


-2.28 ±0.33 


9.84±0.31 


8.01 ±0.31 


15.42±0.35 


-5.34 ±0.40 


0.70 ±0.29 


Ser 


-14.75±0.31 


-7.79 ±0.30 


0.00 ±0.00 


6.62 ±0.29 


6.91 ±0.29 


1.97±0.36 


-2.40 ±0.22 


Thr 


-4.17 ±0.32 


0.00 ±0.00 


-2.04±0.27 


12.40 ±0.31 


7.81 ±0.33 


-7.91 ±0.36 


6.79 ±0.24 


Trp 


-22.93 ±0.39 


2.31 ±0.44 


17.92±0.42 


-1.30 ±0.40 


8.17±0.43 


-7.73±0.44 


-7.23±0.38 


Tyr 


-13.82±0.38 


7.63±0.42 


16.16±0.38 


9.73 ±0.36 


2.92 ±0.40 


6.10±0.44 


-4.82 ±0.32 


Val 


-9.12±0.31 


-6.80 ±0.32 


-6.92±0.30 


2.59 ±0.29 


0.00 ±0.00 


4.16±0.39 


-4.22 ±0.24 
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single substitutions from alanine to the other amino acids. The averaged AAG values are listed in 
Table 2. The largest AAG are caused by the negatively charged amino acids (Glu, Asp) and the 
positively charged amino acids (Arg, Lys), indicating that introduction of charged amino acids in 
the dominant epitope decreases the binding affinity between antibody and hemagglutinin. Note that 
amino acid substitutions that change the charge of hemagglutinin significantly affect the calculated 
free energy values. 39 " 41 The issue of how to best calculate free energy differences when charge 
changes has been debated over the years. In the present paper, we are using the standard Ewald 
approach with explicit solvent. We note that the evolutionary history of H3 hemagglutinin since 
1968 shows an increasing trend of the number of charged amino acids in epitope B, 42 which 
agrees with the results that introduction of charged amino facilitates virus evasion from antibody, 
as illustrated in Table 2. The result that introduction of charged amino acid on average increases 
AAG is not an artifact, is supported by data from the influenza evolution, and is expected on the 
basis that charge is hydrophilic. In addition to the charge, the rank of free energy differences also 
largely correlated to the size of amino acid. By the definition used by RasMol, 43 the 16 uncharged 
amino acids are tagged as hydrophobic (Ala, Gly, He, Leu, Met, Phe, Pro, Trp, Tyr, Val), large (Gin, 
Hse, He, Leu, Met, Phe, Trp, Tyr), medium (Asn, Cys, Pro, Thr, Val), and small (Ala, Gly, Ser), 
as shown in Table 2. The ranks of small amino acids are lower than those of medium amino acids 
(p = 0.036, Wilcoxon rank-sum test) and those of large amino acids (p = 0.085, Wilcoxon rank- 
sum test). In contrast, the hydrophobicity of the uncharged amino acids is largely uncorrelated to 
their ranks by AAG. As a result, charged amino acids in the dominant epitope are essential to the 
immune evasion while the virus escape substitution among small amino acids have minimal effect. 

Epitope B comprises 21 amino acid sites in the top of the hemagglutinin trimer. Taking the 
probability for one substituting amino acid to exist at each site to be proportional to the relative 
frequency of this amino acid in H3 hemagglutinin, the weighted average free energy difference in 
each of the 21 sites was calculated. The relative frequencies of 20 amino acids were obtained from 
6896 H3 hemagglutinin sequences deposited between 1968 and 2009 in the NCBI database 44 and 
listed in Table 2. Also using the AAG values in Table 1, we calculated and tabulated in Table 3 
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Table 2: The rank of the average binding free energy difference of the single substitution from 
alanine to another amino acid over all the 21 amino acid sites in epitope B of hemagglutinin trimer. 
The rank correlates with the charge and the size of amino acid, and it is relatively uncorrelated 
to the hydrophobicity. Here we applied classifications of RasMol for the biochemical properties 
of the 20 amino acids. 43 The relative frequencies of 20 amino acids were counted from the H3 
sequences in NCBI database from 1968 to 2009. 



Rank 


Amino Acid 


AAG (kcal/mol) 


Charged 


Hydrophobic 


Large 


Medium 


Small 


Relath 


1 


Glu 


14.612 ±0.061 


X 




X 






0.029 


2 


Asp 


14.533 ±0.055 


X 






X 




0.051 


3 


Arg 


6.018±0.078 


X 




X 






0.052 


4 


Lys 


5.766 ±0.078 


X 




X 






0.057 


5 


Tip 


4.458 ±0.081 




X 


X 






0.016 


6 


Tyr 


3.984±0.071 




X 


X 






0.035 


7 


Thr 


3.981 ±0.050 








X 




0.078 


8 


Pro 


3.912±0.054 




X 




X 




0.060 


9 


Met 


3.562 ±0.062 




X 


X 






0.009 


10 


Phe 


3.522 ±0.073 




X 


X 






0.030 


11 


Hse 


2.654 ±0.064 






X 






0.020 


12 


Gin 


1.985 ±0.063 






X 






0.042 


13 


He 


1.396 ±0.060 




X 


X 






0.070 


14 


Asn 


1.1 50 ±0.054 








X 




0.085 


15 


Val 


1.147 ±0.051 




X 




X 




0.055 


16 


Cys 


0.888 ±0.046 








X 




0.028 


17 


Ser 


0.469 ±0.044 










X 


0.096 


(18) 


(Ala) 


(0.000 ±0.000) 




X 






X 


0.046 


19 


Gly 


-1.612±0.055 




X 






X 


0.070 


20 


Leu 


-2.273 ±0.064 




X 


X 






0.071 



frequency 



for each site i the value of (AAG),-, which is the average AAG weighted by the probability for each 
different amino acid to be introduced, where probability is proportional to the relative frequencies 
of 20 amino acids counted from the H3 sequences in NCBI database from 1968 to 2009. 

As shown in Table 3, there is obvious variation among the expected free energy differences 
(AAG), caused by single substitutions at amino acid site i of epitope B. This variation is partly due 
to the wildtype amino acids in the sites. For instance, the wildtype amino acid in site 190 is Glu that 
has the highest rank in Table 2. As shown in Table 3, any amino acid substitution in site 190 tends 
to have a negative AAG. Another cause of variation in (AAG),- is that distinct sites affect differently 
the antibody binding process. Epitope B of the wildtype A/Aichi/2/1968 hemagglutinin sequence 
contains five sites with threonine: 128, 155, 160, 187, and 192. The mathematical expectancies 
(AAG),- in these five sites are -7.746, 4.471, 4.956, 1.182, and -1.737 kcal/mol, respectively. 
Therefore, each site in epitope B has a specific effect on the virus escape substitution. A random 
substitution in epitope B affects the antibody binding free energy differently depending on the site 
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Table 3: The rank of the average free energy difference (AAG) i generated by a substitution in each 
amino acid site i of epitope B. 



Rank 


Site 


(AAG) ; - (kcal/mol) 


1 


193 


8.074±0.081 


2 


159 


7.792 ±0.094 


3 


165 


7.741 ±0.086 


4 


158 


6.128±0.108 


5 


196 


5.444 ±0.088 


6 


160 


4.956 ±0.090 


7 


186 


4.754 ±0.076 


8 


163 


4.722 ±0.085 


9 


129 


4.690 ±0.103 


10 


155 


4.471 ±0.081 


11 


156 


4.029 ±0.106 


12 


157 


3.944 ±0.090 


13 


188 


2.945 ±0.092 


14 


194 


1.886±0.080 


15 


187 


1.182±0.087 


16 


198 


0.531 ±0.072 


17 


189 


-0.631 ±0.098 


18 


192 


-1.737 ±0.087 


19 


197 


-1.967 ±0.099 


20 


128 


-7.746 ±0.098 


21 


190 


-12.666 ±0.084 
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and the substituting amino acids. 

The variation of (AAG) ; is also reflected by the tertiary structure of the epitope B bound by 
the antibody. By looking into the structure of epitope B shown in Figure 2. Epitope B resides 
in two protruding loops from amino acid site 128 to 129, and from site 155 to 165, respectively, 
and in a oc-helix from site 186 to 198. Site 128 has a negative average free energy difference 
(AAG) 128 = —7.746 ±0.098 kcal/mol. All the other sites in these two loops show a positive 
(AAG) i value of a random substitution, with the minimum (AAG} 157 = 3.944 ±0.090 kcal/mol in 
site 157. The a-helix is located between hemagglutinin and antibody. In the a-helix, the sites 
facing towards the antibody usually present large positive (AAG) i values such as site 193 and 196, 
while the sites facing towards the hemagglutinin show lower (AAG) i such as site 189, 192, and 
197. Thus in the one dimensional sequence from site 186 to 198, the (AAG) i values oscillate with 
peaks and valleys corresponding to the sites in the a-helix facing alternatingly to the antibody and 
hemagglutinin. Consequently, the variation of the expected free energy changes in distinct sites 
depends on the structure of the hemagglutinin-antibody complex. 

3.2 Historical Substitutions in Epitope B 

The simulation results are supported in two aspects by amino acid sequence data of H3 hemagglu- 
tinin collected since 1968. These historical sequences are downloaded from the NCBI Influenza 
Virus Resource 45 and aligned. First, Pan et al. analyzed the number of charged amino acid in 
epitope B of H3 hemagglutinin in each year since 1968, and found an increasing trend of charged 
amino acids. 42 This finding supports the results that amino acid substitution introducing charged 
residues on average facilitates virus escape from antibody, as illustrated in Table 2. Second, amino 
acid substitutions in epitope B between 1968 and 1975 also verified the free energy calculation, as 
shown below. 

With the knowledge of the free energy landscape of the single substitutions, we are able to rec- 
ognize favorable single substitutions in epitope B. Substitutions with large positive AAG values en- 
able the virus to evade the immune pressure and increase the virus fitness. Favorable substitutions 
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Figure 2: The tertiary structure of the interface between the HA1 domain of H3 hemagglutinin 
monomer A/Aichi/2/1968 (bottom) and the antibody HC63 (top) (PDB code: 1KEN). Water 
molecules are not shown. Epitope B of the HA1 domain is located in two loops and one ct-helix 
with the color scale modulated according to the expected free energy difference (AAG),- of each 
site i in epitope B. The color scale ranges from red for the most negative (AAG) ,- values to blue for 
the most positive (AAG),- values. The sites i in epitope B with (AAG), near zero are colored white. 
The region outside epitope B is colored gray. The red site 128 is far from the antibody binding 
region and the red site 190 possessed the original amino acid Glu, which is a charged amino acid. 
It may explain why these two sites show negative (AAG),- with large absolute values. 
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grow in the virus population. Selection for substitutions with large AAG is part of the evolutionary 
strategy of the virus. The results of free energy calculation can also explain the substituted virus 
strains collected in history. 

We analyzed the hemagglutinin sequence information of H3N2 strains evolving from the A/Aichi/2/1968 
strains. H3 hemagglutinin circulating from 1968 to 1971 was mainly in the HK68 antigenic cluster 
while those circulating from 1972 to 1975 were mainly in the EN72 antigenic cluster. 46 Table 4 
shows that in the dominant epitope B, there were 17 substitutions occurred in 12 sites collected be- 
tween 1968 to 1975, 47 which contributed to the immune evasion and corresponding virus evolution 
from the HK68 cluster to the EN72 cluster. Also listed in Table 4 are the free energy differences of 
these historical substitutions. The 17 substituting amino acids have significantly higher ranks com- 
pared to the corresponding wildtype amino acids (p = 0.0044, Wilcoxon signed-rank test). This 
significant difference is expected because 15 of 17 substituting amino acids have ranks between 
1 and 10, while 10 of 12 wildtype amino acids in the substituted site have ranks between 1 1 and 
20. In all the 21 sites in epitope B, 15 of 21 wildtype amino acids have ranks between 11 and 
20. Additionally, the AAG values of these 17 substitutions listed in Table 4 are greater than the 
expected free energy differences (AAG),- in Table 3 of random substitutions in the 12 substituted 
sites (p = 0.013, Wilcoxon signed-rank test). 

We also looked into the historical escape substitutions in epitope B evading the immune pres- 
sure of the vaccine strains. For each influenza season, the amino acids in the administered vaccine 
strain were defined as the wildtype ones and those in the dominant circulating strain as the sub- 
stituting amino acids. In each of the 19 seasons in which H3N2 virus was the dominant subtype 
from 1971 to 2004, the substitutions in epitope B were located 3 and their AAG values were ob- 
tained from Table 1. As shown in Table 5, escape substitutions in epitope B as of 1973 mostly had 
positive AAG and generated substituting amino acids with increased rank (p = 0.047, Wilcoxon 
signed-rank test). Such tendency to introduce amino acids with higher ranks was not observed 
after 1973: the ranks of wildtype and substituting amino acids after 1973 present little significant 
difference (p = 0.28, Wilcoxon signed-rank test). The hemagglutinin of A/Aichi/2/1968 used in 
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Table 4: Substitutions occurred in epitope B of the hemagglutinin A/Aichi/2/1968 (H3N2) as of 
1975. Also listed are the time when the substitutions were observed, and the free energy differences 
with standard errors. In each site of epitope B, all the 20 amino acid were sorted in the descending 
order by the free energy differences introduced by a substitution from the wildtype amino acid to 
20 amino acids. The ranks of the substituting amino acid and the wildtype amino acid in each 
substituted site are listed in the column Rank (substituting) and Rank (WT), respectively. 



Substitution 


Year 


AAG (kcal/mol) 


Rank (substituting) 


Rank (WT) 


T128N 


1971 


-4.796 ±0.361 


8 


7 


T128I 


1975 


-16.026±0.412 


18 


7 


G129E 


1970, 1972 


10.500 ±0.415 


4 


17 


T155Y 


1972-1973, fixed in 1973 


7.254 ±0.358 


9 


14 


G158E 


1971-1972 


8.584 ±0.479 


6 


17 


S159N 


1971, 1974-1975 


10.969 ±0.352 


5 


17 


S159C 


1972 


7.923 ±0.324 


6 


17 


S159R 


1972 


7.065 ±0.424 


7 


17 


T160A 


1973 


4.160±0.217 


11 


18 


S186N 


1975 


4.673 ±0.298 


10 


14 


N188D 


1971-1973, fixed in 1973 


19.767 ±0.367 


1 


14 


Q189K 


1975 


9.484 ±0.640 


2 


10 


E190V 


1972 


-9.115±0.310 


5 


3 


E190D 


1975 


18.752±0.324 


1 


3 


S193N 


1972-1975 


8.239±0.301 


10 


12 


S193D 


1975 


15.285 ±0.294 


7 


12 


A198T 


1972 


6.793 ±0.236 


3 


14 
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the free energy calculating is in the HK68 antigenic cluster. Perhaps after the virus evolved into 
the next EN72 cluster, change in the virus antigenic character stimulates the immune system to 
produce new types of antibody other than the HC63 antibody used in the calculation. A different 
binding antibody changes the free energy landscape of the substitutions in epitope B. Thus the 
application of the present free energy landscape should be limited within the HK68 and EN72 
clusters. Free energy differences of substitutions in the EN72 cluster would need to be calculated 
using the updated antibody crystal structure. 

4 Discussion 

4.1 Fitness of the Virus Strains 

The free energy landscape shown in Table 1 gives the change of the antibody binding affinity, 
K\/Kq = exp(—AAG/RT), induced by each possible substitution in epitope B of the wildtype 
hemagglutinin. The majority of the substitutions lead to positive AAG, and yield a reduced binding 
affinity that is smaller than the binding affinity of the original mature antibody Kq. Decreased 
antibody binding constant grants the virus a higher chance of evading the immune pressure and 
infecting host cells. We propose that virus fitness is positively correlated to the free energy differ- 
ence AAG. The other factor affecting virus fitness is the capability of the hemagglutinin to maintain 
the normal biochemical functions, such as virus entry. Most sites in epitope B changed amino acid 
identities during 1968 to 2005 as the H3N2 virus kept circulating. 47 We therefore postulate that the 
substitutions in epitope B do not greatly interfere with the biochemical function of hemagglutinin, 
and virus fitness is dominantly determined by the free energy difference resulted from substitutions 
in epitope B. 

The binding constant between hemagglutinin and antibody after the first round of maturation 
is about 10 6 M , and the binding constant of an uncorrelated antibody is below 10 2 M . On 
average, four substitutions in epitope B change the substituted hemagglutinin sufficiently so that 
the immune response of the original antibody binding to epitope B is abrogated. 3 Since this is a 
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Table 5: Substitutions occurred in epitope B of H3 hemagglutinin between the vaccine strain and 
the dominant circulating strain in each season in which the H3N2 subtype was dominant. The 
free energy difference with standard error of each substitution is obtained using the free energy 
landscape in Table 1. The ranks of free energy differences sorted in the descending order are listed 
in column Rank (vaccine) and in column Rank (circulating) for the amino acids in the vaccine 
strain and the dominant circulating strain, respectively. 



Year 


Substitution 


AAG (kcal/mol) 


Rank (vaccine) 


Rank (circulating) 


1972 


T155Y 


7.254±0.358 


14 


9 


1972 


G158E 


8.584±0.479 


17 


6 


1972 


S159C 


7.923 ±0.324 


17 


6 


1972 


El 90 V 


—9.115 ±0.310 


3 


5 


1973 


T160A 


4.160±0.217 


18 


11 


1973 


N188D 


19.767 ±0.367 


14 


1 


1973 


S193N 


8.239 ±0.301 


12 


10 


1975 


S157L 


-6.256 ±0.394 


15 


19 


1975 


A160T 


-4.160±0.217 


11 


18 


1975 


Q189K 


9.484 ±0.640 


10 


2 


1975 


N193D 


7.046 ±0.317 


10 


7 


1984 


E156K 


—26.536 ±0.429 


1 


15 


1984 


V163A 


-0.243 ±0.217 


15 


16 


1984 


D190E 


-18.752±0.324 


1 


3 


1984 


I196V 


-0.768 ±0.343 


16 


18 


1987 


Y155H 


-4.782±0.414 


9 


11 


1987 


E188D 


9.669 ±0.382 


3 


1 


1987 


K189R 


-9.872 ±0.697 


2 


11 


1996 


V190D 


27.867 ±0.299 


5 


1 


1996 


LI 941 


-6.914 ±0.324 


13 


17 


1997 


K156Q 


13.140±0.413 


15 


3 


1997 


E158K 


-10.187±0.515 


6 


18 


1997 


V190D 


27.867 ±0.299 


5 


1 


1997 


LI 941 


-6.914 ±0.324 


13 


17 


1997 


V196A 


5.947 ±0.229 


18 


11 


2003 


H155T 


-2.472 ±0.355 


11 


14 


2003 


Q156H 


-20.028 ±0.365 


3 


20 


2003 


S186G 


0.132 ±0.275 


14 


13 
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reduction of the binding constant from roughly 10 6 M 1 to 10 2 M \ one amino acid substitution 
that contributes to immune escape causes on average a 10-fold decrease in antibody binding con- 
stant, or equivalently AAG cr i t = 1.42 kcal/mol at 310 K. Assuming the effect of immune evasion 
can be broken into the sum of individual amino acid substitutions in the dominant epitope, 3 we 
define the virus fitness w as the sum of the contribution in each site of epitope B 

w=A + £ Swi. (27) 

epitope B 

We denote by AAG ; ar the free energy difference to substitute amino acid a to amino acid y at site 
i. We investigated two versions of the virus fitness landscape. The first is to define 8wi as a linear 
function of the free energy difference of the substitution 

AAG ar 

8w i =A l — (28) 
AAG crit 

The second is to define <5w ; as a step function 

8wi = A 2 H (AAG" 7 - AAG cri t) (29) 

in which H is the Heaviside step function. Illustrated in the simulation below, either definition of 
fitness is sufficient to explain the observed immune evasion of the H3N2 virus. 



4.2 Selection in the Epitope 

Evolution of the H3N2 virus is driven jointly by neutral evolution and selection. 48 Neutral evolu- 
tion may be ongoing in sites outside the epitopes. The high substitution rate in epitope B suggests 
that selection is the major factor shaping the pattern of evolution in that epitope. 47 Shown in Ta- 
ble 4 and Table 5 are the historical substitutions. The significantly increased ranks of free energy 
differences suggests the existence of selection by the immune pressure for substitutions that have 
increased the free energy difference AAG and decreased the antibody binding constant. The im- 
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mune selection is directional: certain types of amino acids such as charged ones were initially 
more likely to be added into the epitope B 42 because they maximally decreased the antibody bind- 
ing constant as indicated in Table 2. The heterogeneity of the expected free energy difference of a 
random substitution in Table 3 shows that each site in epitope B has a specific weight with regard 
to immune escape. 

Table 4 also illustrates that the immune selection did not necessarily pick the amino acid with 
the highest rank of AAG as the substituting amino acid. Amino acids with moderate rank were 
introduced into epitope B even for the fixed substitution T155Y. Therefore the historical evolution 
did not simply substitute amino acids by maximizing the free energy differences in Table 1 . This 
phenomenon is possibly due to two causes. First, the virus fitness may be insensitive to the AAG 
values, e.g. A\ in Eq. (28) may be small, or amino acid substitutions with large AAG values 
may contribute equivalently to the fitness, as in Eq. (29). Second, only a small fraction of virus 
in one host is shed by the host and infects the next host, so the population size of propagated 
virus from one host is smaller by several orders of magnitude than the total virus population size 
in the same host. Additionally, a seasonal bottleneck exists in the influenza virus circulation. 49 
Both random mutation and small population sizes lead to dramatic randomness in the evolution. 
Consequently, the evolution of H3 hemagglutinin is not solely determined by maximizing the free 
energy differences in Table 1 and minimizing the antibody binding constant, even if the virus is 
under immune selection. Instead, randomness plays a key role in the H3N2 virus evolution. 

4.3 A Picture of the H3N2 Virus Evolution 

Selection depends on the fitness of each virus genotype that is quantified as a non-decreasing 
function of the free energy difference AAG. Moderate selection in epitope B requires that fitness 
improvement is limited when AAG is large. One possibility is that the ratio A\/Aq in Eq. (28) 
is small. Another is that the fitness takes the form of Eq. (29) in which all substitutions with 
AAG > AAG cr it have equal fitness. 

The virus evolution is also affected by the genetic drift. Genetic drift is a term which captures 
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the random component of evolution due to the large size of the phase space of possible substitutions 
relative to the single set of substitutions that lead to the highest viral fitness. The effect of genetic 
drift is quantitatively reflected in the fixation process of a new strain, as shown in the simulation 
below. A narrow bottleneck of virus propagation allows only a small fraction of the progeny to 
survive, imposing a notable probability that a favorable substitution is lost in the next generation. 
The effect of genetic drift is to increase the randomness in the virus evolution so that observed 
substitutions are based on chance in addition to the fitness of these substitutions. 

To model the H3N2 evolution discussed above, we ran two Monte Carlo simulations of the 
influenza evolution model. A population of N sequences of epitope B with 21 sites were created 
and initialized as the wildtype A/Aichi/2/1968 sequence. Here N = 10 3 to account for a narrow 
genetic bottleneck of hemagglutinin and for tractability of the simulation. We iterated the simula- 
tion program for 5,000 generations or about five years to recreate a pattern of evolution similar to 
that in history and shown in Table 4. The random substitution rate of H3 hemagglutinin is roughly 
4.5 x 10~ 6 amino acid substitution/site/generation. 50 We let the number of substitutions follow a 
Poisson distribution with mean X = 21 x 4.5 x 10~ 6 ./V = 9.5 x 10~ 5 N and randomly assigned the 
substitution sites. The substituting amino acid at each substitution site was randomly picked from 
the remaining 19 amino acids proportional to the historical frequencies observed in hemagglutinin. 
The fitness w in the first simulation was calculated for each sequence using Eq. (28) with Aq = 100 
and A\ = 3 and that in the second simulation was calculated for each sequence using Eq. (29) 
with Aq = 100, A2 = 9, and AAG cr i t = 1.42 kcal/mol. Note that by choosing A\ = 3 for the first 
simulation, a random substitution causes the expected fitness to change from 100 to 104.9, and 
by choosing A2 = 9 for the second simulation, a random substitution changes the expected fitness 
from 100 to 105.0. The size of the progeny of each sequences equals the fitness w of the sequence 
if w > 0, and equals if w < 0. The next generation of sequences was initialized by randomly 
sampling N sequences from the progeny sequences. 

The results of both simulations showed remarkable similarity to the observed substitutions in 
Table 4 with the bottleneck N equal to 10 3 . See Figure 3 and Figure 4. Amino acid substitutions 
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generated in the simulation are usually distinct with those in Table 4 observed in history. The 
AAG values of each substitution emerging in the simulation are nevertheless similar to those of 
the historical substitutions listed in Table 4. As was observed in history in Table 4, most of the 
substituted strains in the simulations with relative frequency greater than 1% have positive AAG 
values with the ranks of the substituting amino acids ranging from 1 to 10. The fixation of a newly 
emerged substitution takes about 1,000 generations or one year on average. Fixed substitutions 
mostly introduce amino acids with positive AAG values in Table 1 and higher ranks in Table 2, 
and several of these fixed substitutions in simulation, such as E190D and N188D, have the highest 
AAG values in the current site. However, fixed substitutions in the simulation are not always 
the substitutions with the highest AAG values in Table 1. These observations suggest that the 
Monte Carlo simulation considering the effect of substitution, selection, and genetic drift is able 
to reproduce the pattern of evolution observed in history. This simulation also shows that besides 
the free energy difference of each substitution, the mapping from the free energy landscape to the 
fitness landscape as well as the random genetic drift are dominant factors of the evolution in virus 
epitopes. 

Shown in Figure 3 and Figure 4 for both simulations are the trajectories of relative frequencies 
of substituting amino acids. The trajectories are similar to historical observations of human H3N2 
virus data. 47 For influenza, 1000 generations roughly equal one year. The two substitutions T 155 Y 
and N188D were fixed in epitope B in 1968-1973. As indicated by Figure 3 and Figure 4, substitu- 
tion T155Y emerged between generation 3000 and 4000, or equivalently between 1971 and 1972 
from the emergence of the H3N2 virus in 1968. 47 Substitution T155Y was fixed between gener- 
ation 4000 and 5000. Similarly, substitution N188D emerged between generation 2000 and 3000 
and was fixed between generation 4000 and 5000. The first simulation in which virus fitness is cal- 
culated using Eq. (28) generated two fixed substitution, G129A that emerged at generation 4000 
and was fixed by generation 5000, and E190D that emerged at generation 3600 and was fixed by 
generation 3900. The second simulation using Eq. (29) generated one fixed substitutions, V196D 
emerging at generation 2900 and fixed by generation 5000, and one substitution that nearly fixed, 
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N188D emerging at generation 4100 and acquiring the relative frequency 0.84 at generation 5000. 
The trajectories in both simulations resemble those of substitutions T155Y and N188D observed 
in history. From these results, the two Monte Carlo simulations appear to capture the main factors 
of immune selection and genetic drift in evolution of the H3N2 virus. 

4.4 Multiple Substitutions 

In this work, we calculated the free energy difference for each possible substitution in epitope B. 
The free energy calculation for multiple substitutions is intractable using the current technology 
due to the combinatorially increasing calculation load for multiple substitutions. The issue of 
multiple substitutions is here addressed by assuming that the effect of immune evasion is well 
represented by the sum of the contribution in each substituted site in epitope B. Data indicate the 
independence of the immune evasion effect of the sites in epitope B. 3 We may, thus, assume that 
the free energy difference of the multiple substitution is the sum of the individual AAG values 
available in Table 1 plus a minor correction term. 

4.5 Prediction of Future Virus Evolution 

The result of this work quantifies the reduction of the binding constant of antibody to virus for 
substitutions in epitope B with larger AAG values and higher ranks of substituting amino acids. A 
newly emerging virus strain with larger antibody binding free energy difference has a greater prob- 
ability to become the dominant strain in the next flu season. Note that due to random fluctuations 
in the large phase space of possible substitutions, actual trajectories deviate from the trajectory 
determined by choosing sites and substituting amino acids with greatest free energy differences. 
With a three dimensional structure of hemagglutinin of the current circulating virus and binding 
antibody, one is able to calculate the free energy landscape for all the possible single substitutions 
in the dominant epitope and estimate the a priori escape probabilities in the next season. The dom- 
inant circulating influenza strain usually possesses amino acid substitutions from the vaccine strain 
against which memory antibodies are generated. Usually these substitutions disrupt the antibody 
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Figure 3: Two fixed substitutions G129A and E190D generated by Monte Carlo simulation of 
epitope B using Eq. (28). Also plotted are two historical fixed substitutions in epitope B: T155Y 
fixed between 1971 and 1973, and N188D fixed between 1970 and 1973. The frequency data 
of historical substitutions are from Shih et al. 47 The origin of time axis is 1968. One thousand 
generation of the H3N2 virus is approximately one year. Figure 3(a) Substitution G129A causing 
the free energy difference AAG = 3.33 ± 0.29 kcal/mol is fixed by the simulation. The rank of 
the free energy difference of G129A is 12 in 19 possible substitutions in site 129. Figure 3(b) 
Substitution E190D with AAG = 18.75 ±0.32 kcal/mol. The rank is 1 in 19 possible substitutions 
in site 190. 
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Figure 4: Two fixed substitutions N188D and V196D generated by Monte Carlo simulation of 
epitope B using Eq. (29). Two historical fixed substitutions T155Y and N188D are also plotted, 
and data are from Shih et al. 47 Figure 4(a) Substitution N188D causing the free energy difference 
AAG = 19.77 ± 0.37 kcal/mol is fixed by the simulation. The rank of the free energy difference 
of N188D is 1 in 19 possible substitutions in site 188. Figure 4(b) Substitution V196D with 
AAG = 9.25 ±0.34 kcal/mol. The rank is 5 in 19 possible substitutions in site 196. The proportions 
of substituting amino acids are represented by different line types. 
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binding process by decreasing the binding constant, as shown in Table 5. Thus one can predict 
vaccine effectiveness by evaluating the antibody binding constant against the dominant circulat- 
ing strain, which is acquired by calculating free energy difference of the amino acid substitutions 
between the vaccine strain and the dominant circulating strain. 3 More accurate predictions of evo- 
lutionary pattern of virus as well as epidemiological data such as vaccine effectiveness may be 
obtained by optimally mapping the free energy landscape to the fitness landscape and taking into 
account random factors such as genetic drift in the evolution process. 

5 Conclusion 

We introduced the Einstein crystal as a technology to improve the results of free energy calcula- 
tion. By calculating the free energy difference of each amino acid substitution, we obtained the 
free energy landscape for substitutions in epitope B of hemagglutinin. There is notable variation 
between the values of free energy differences of different substitutions at different sites, because 
the identities of original and substituting amino acids, as well as the locations of amino acid sub- 
stitutions, affect to differing degrees the antibody binding process. In this free energy landscape, 
we suggest that virus tends to evolve to higher AAG values to escape binding of antibody. Coun- 
terbalancing this selection is random drift. Historical amino acid substitutions in epitope B and 
Monte Carlo simulations of the virus evolution using the free energy based virus fitness, in which 
random genetic drift of the virus adds statistical noise into the virus evolution process, showed that 
selected substitutions are biased to those with positive AAG values. 
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